Here we summarize the processing steps applied to the WASH dataset.
Three WASH variables were created following the WHO definitions (Damazo). See the codebook for variable labels.
cat_watersource
cat_toilettype
cat_garbagedisposal
For every variable, cases with NIU or Missing: Impute were recoded to NA.
Cases which had NA in Gender and Age were completely dropped.
Smaller groups were combined into an "Others" category.
A composite variable was created from the three WASH variables using logistic PCA.
Household total expenditure was centered.
For every case, we summed the number of WASH indicators they had access to (max = 3) and calculated the proportion (not sure what to call this rate). Is it possible to model the total as a Poisson process?
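The recoding, dropping, centering and counting steps above can be sketched in base R as follows. This is only an illustration on made-up data: the level names "Improved" and "Unimproved", the uncentered column name expend_total_USD_per, the new columns wash_count and wash_prop, the rule that a case is dropped when either gender or age is NA, and the choice to ignore NA indicators when counting are all assumptions, not taken from the actual dataset.

```r
# Toy household data; every value here is invented for illustration.
hh <- data.frame(
  cat_watersource      = c("Improved", "NIU", "Unimproved", "Improved", "Missing: Impute"),
  cat_toilettype       = c("Improved", "Improved", "Unimproved", "NIU", "Improved"),
  cat_garbagedisposal  = c("Improved", "Unimproved", "Improved", "Improved", "Improved"),
  gender               = c("Female", "Male", NA, "Female", "Male"),
  ageyears             = c(34, 51, 27, NA, 45),
  expend_total_USD_per = c(120, 80, 95, 60, 150),  # hypothetical uncentered column
  stringsAsFactors     = FALSE
)

wash_vars <- c("cat_watersource", "cat_toilettype", "cat_garbagedisposal")
na_codes  <- c("NIU", "Missing: Impute")

# Recode NIU / Missing: Impute to NA in every variable.
hh[] <- lapply(hh, function(col) {
  col[col %in% na_codes] <- NA
  col
})

# Drop cases with NA in gender or age (assumed reading of the rule).
hh <- hh[!is.na(hh$gender) & !is.na(hh$ageyears), ]

# Center household total expenditure on its mean.
hh$expend_total_USD_per_centered <-
  hh$expend_total_USD_per - mean(hh$expend_total_USD_per)

# Count the WASH indicators each case has access to (max = 3) and take
# the proportion; NA indicators are ignored here (an assumption).
hh$wash_count <- rowSums(hh[wash_vars] == "Improved", na.rm = TRUE)
hh$wash_prop  <- hh$wash_count / length(wash_vars)

hh[, c(wash_vars, "wash_count", "wash_prop")]
```

Because the count is bounded at 3, it may be worth comparing a Poisson fit against a binomial fit with 3 trials before settling on a count model.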
Plots were created for the individual WASH variables, but initial modelling uses the composite WASH variable.
We also present results from a generalized linear mixed-effects model (GLMM) fitted with glmer from the lme4 package.
Use scoring approaches, e.g., PCA, to create a composite WASH variable and then apply a GLMM.
Apply multivariate mixed models, either using a pseudo-multivariate approach in glmer or other approaches proposed by Samuel.
Assume equal weights for each of the WASH indicator variables and model the total as count data, using a Poisson or negative binomial model.
Model each WASH variable separately.
Any other suggestions?
The table below summarizes the proportion of missingness for all the variables.
We begin by showing the distribution of the individual WASH variables (indicators) over time and space (slum area). Thereafter, we show the distribution of the demographic, social and economic variables of interest by the composite WASH variable.
To gain some understanding before moving to a more complex model, we simulated a ‘fake’ response variable.
We then ran \(1000\) simulations and, for each simulation, calculated:
‘Fake’ y
y = rbinom(n, 1, p)
Average ‘fake’ y to obtain the proportion of 1s generated
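A minimal sketch of this simulation loop is given below. The sample size n, the probability p and the seed are placeholder values chosen for illustration. Only the number of simulations (1000) and the rbinom call come from the text.

```r
set.seed(2024)   # arbitrary seed, for reproducibility only

n     <- 500     # assumed number of cases per simulated data set
p     <- 0.3     # assumed probability that y = 1
nsims <- 1000    # number of simulations, as stated in the text

# For each simulation: draw 'fake' y, then average it to get the
# proportion of 1s generated.
prop_ones <- replicate(nsims, {
  y <- rbinom(n, 1, p)
  mean(y)
})

summary(prop_ones)
```

The simulated proportions should centre on p, so their spread gives a feel for the sampling variability to expect at a given n.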
The first model was fitted separately to each WASH variable:
\[\begin{align} single\_wash\_var &\sim intvwyear + slumarea + ageyears\\ & + gender + ethnicity + numpeople\_total + isbelowpovertyline\\ & + wealthquintile + expend\_total\_USD\_per\_centered \end{align}\]
For the second model, we restructured the data into long format and fitted the model to a single indicator variable:
\[\begin{align} wash\_indicator &\sim (intvwyear + slumarea + ageyears\\ & + gender + ethnicity + numpeople\_total + isbelowpovertyline\\ & + wealthquintile + expend\_total\_USD\_per\_centered) * wash\_variable\_label \end{align}\]
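The long-format restructuring can be sketched in base R as below. The toy data, the id column and the 0/1 coding of the indicators are assumptions made for illustration, and only two of the predictors are included to keep the example short.

```r
# Toy wide data: one row per case, one 0/1 column per WASH variable.
wide <- data.frame(
  id                  = 1:4,                    # hypothetical case identifier
  intvwyear           = c(2006, 2006, 2009, 2009),
  slumarea            = c("A", "B", "A", "B"),
  cat_watersource     = c(1, 0, 1, 1),
  cat_toilettype      = c(0, 0, 1, 1),
  cat_garbagedisposal = c(1, 1, 0, 1)
)

wash_vars <- c("cat_watersource", "cat_toilettype", "cat_garbagedisposal")

# Stack the three WASH columns: one row per case per WASH variable.
long <- do.call(rbind, lapply(wash_vars, function(v) {
  data.frame(
    id                  = wide$id,
    intvwyear           = wide$intvwyear,
    slumarea            = wide$slumarea,
    wash_variable_label = v,
    wash_indicator      = wide[[v]]
  )
}))

# The `*` expands to main effects plus each predictor's interaction
# with the WASH variable label.
f <- wash_indicator ~ (intvwyear + slumarea) * wash_variable_label
term_labels <- attr(terms(f), "term.labels")
term_labels
```

Each case contributes three rows to the long data, so a model on it would normally need to account for that within-case dependence.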
We then used the estimated coefficients \(\beta\) to simulate ‘fake’ y.
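The idea can be sketched as follows. For simplicity this uses a plain logistic regression fitted with glm (fixed effects only) on toy data, not the glmer models of the report. The data, the two predictors and the "true" coefficients used to generate the toy outcome are all made up.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# Toy data with two of the predictors; values are invented.
n <- 2000
d <- data.frame(
  ageyears = round(runif(n, 18, 70)),
  gender   = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
eta <- -1 + 0.02 * d$ageyears + 0.5 * (d$gender == "Male")
d$wash_indicator <- rbinom(n, 1, plogis(eta))

# Fit the model and extract the estimated coefficients (beta).
fit      <- glm(wash_indicator ~ ageyears + gender, family = binomial, data = d)
beta_hat <- coef(fit)

# Turn the linear predictor X %*% beta into fitted probabilities.
X     <- model.matrix(fit)
p_hat <- plogis(drop(X %*% beta_hat))

# Simulate 'fake' y from the fitted probabilities and record the
# proportion of 1s in each simulation.
nsims     <- 1000
fake_prop <- replicate(nsims, mean(rbinom(n, 1, p_hat)))

c(observed = mean(d$wash_indicator), simulated = mean(fake_prop))
```

If the model is adequate, the simulated proportions should sit close to the observed proportion of 1s. The same check could be made within subgroups such as slum area or interview year.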